Dobre Bogdan-Mihai, Moldovan George, Mocanu Alexandru
07 decembrie 2020
Reprezentarea datelor sub forma de retea e o abordare diferita fata de modul clasic de reprezentare, cel tabelar. Acest format poate evidentia noi caracteristici ale datelor si imbunatesteste vizualizarea acestora intr-un mod semnificativ.
Pentru a realiza o analiza originala, setul de date folosit reprezinta structura unei retele de trafic cu tigari de contrabanda din Romania, datele fiind extrase dintr-un dosar penal. Astfel, utilizand metode de graph mining vom realiza o analiza asupra structurii retelei, dar si asupra rolurilor individuale ale membrilor.
Datele au fost modelate folosind un obiect de tip reţea din librăria statnet. Legăturile dintre noduri au fost introduse folosind o lista de muchii, iar nodurile au următoarele atribute : nume, nume abreviat si rol.
netmat <- rbind(c(1,2),
c(1,3),
c(2,3),
c(1,4),
c(5,6),
c(7,8),
c(5,7),
c(5,8),
c(5,9),
c(6,7),
c(6,8),
c(6,9),
c(5,10),
c(6,10),
c(11,12),
c(11,13),
c(13,14),
c(14,19),
c(13,19),
c(14,1),
c(19,15),
c(19,16),
c(19,17),
c(19,18),
c(12,15),
c(12,16),
c(12,17),
c(12,18),
c(20,8),
c(20,9),
c(21,8),
c(21,9),
c(3,8),
c(3,9),
c(1,8),
c(1,9))
net <- network(netmat, matrix.type="edgelist")netmatsym <- symmetrize(as.sociomatrix(net), rule ="weak")
netsym <- network(netmatsym, matrix.type="adjacency")
network.vertex.names(netsym) <- c("B***cu L***na",
"B***cu An***us",
"B**scu C***nel",
"B**hiu G***ge",
"M**tu M**na",
"Ma**u I***he",
"T**a F**p",
"T**a G***ghe",
"S**m An**la",
"G**ca G****ghe",
"C**u I**n",
"M***u L**do",
"D**a D**a",
"D**a C**l",
"N**cu P**u",
"N**se T**er",
"S***an C***tin",
"O***u A**ei",
"D**a I***l",
"P**ci V***e",
"D***mir R**a")
set.vertex.attribute(netsym, "role", c("C",
"C",
"C",
"CR",
"C",
"C",
"CT",
"CT",
"CT",
"C",
"C",
"A",
"A",
"C",
"C",
"C",
"C",
"C",
"CT",
"D",
"D"))
# C : Comerciant, CR : Cartita, CT: contrabandist, A: aducator clienti, D: depozitare
set.vertex.attribute(netsym, "abrev_name", c("BL",
"BA",
"BC",
"BG",
"MM",
"MI",
"TF",
"TG",
"SA",
"GG",
"CI",
"ML",
"DD",
"DC",
"NP",
"NT",
"SC",
"OA",
"DI",
"PV",
"DR"))
netsym %v% "alldeg" <- degree(netsym)
namelab <- get.vertex.attribute(netsym, "vertex.names")
rolelab <- get.vertex.attribute(netsym, "role")
abrevnamelab <-get.vertex.attribute(netsym, "abrev_name")
my_pal <- brewer.pal(5,"Dark2")
rolecat <- as.factor(get.vertex.attribute(netsym,"role"))plot(netsym,
main = "Infractional network",
usearrows=FALSE,
mode="fruchtermanreingold",
vertex.col = my_pal[rolecat],
label=rolelab,
displaylabels=T,
vertex.cex = 1.5)O prima analiza asupra retelei este realizarea rezumatului in 5 puncte. Functiile prezente in libraria statnet faciliteaza realizarea acesteia. Analizand aceste valori, putem avea o prima impresie despre structura retelei si despre modul de organizarea a acesteia.
## [1] "BASIC CHARACTERISTICS"
## Network attributes:
## vertices = 21
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 72
## missing edges = 0
## non-missing edges = 72
## density = 0.1714286
##
## Vertex attributes:
##
## abrev_name:
## character valued attribute
## attribute summary:
## the 10 most common values are:
## BA BC BG BL CI DC DD DI DR GG
## 1 1 1 1 1 1 1 1 1 1
##
## alldeg:
## numeric valued attribute
## attribute summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 4.000 6.000 6.857 10.000 14.000
##
## role:
## character valued attribute
## attribute summary:
## A C CR CT D
## 2 12 1 4 2
## vertex.names:
## character valued attribute
## 21 valid vertex names
##
## No edge attributes
## [1] "Size:"
## [1] 21
## [1] "Density:"
## [1] 0.1714286
## [1] "Components:"
## [1] 1
## [1] "Diameter:"
## [1] 7
## [1] "Transitivity:"
## [1] 0.25
Folosind atributele definite în momentul creării, putem filtra reţeua astfel încat putem evidenţia importanţa unui anume rol. Spre exemplu, dacă am păstra în reţea doar Comercianţii, putem observa că aceştia sunt în mare partea izolaţi, distrugând aspectul de reţea compactă. Acest lucru evidenţiază rolul contrabandiştilor în reţea, aceştia asigurând practic conexitatea reţelei.
print("Filtering networks")
print(get.vertex.attribute(netsym, "role"))
comercianti <- get.inducedSubgraph(netsym, which (netsym %v% "role"=="C"))
gplot(comercianti,displaylabels=TRUE, main="Comercianti")delete.vertices(comercianti, isolates(comercianti))
gplot(comercianti, displaylabels = TRUE, main="Grupuri de comercianti")gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
vertex.cex=1.5,mode='circle',main="circle")gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
vertex.cex=1.5,mode='eigen',main="eigen")gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
vertex.cex=1.5,mode='random',main="random")gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
vertex.cex=1.5,mode='spring',main="spring")gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
vertex.cex=1.5,mode='fruchtermanreingold',main='fruchtermanreingold')gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
vertex.cex=1.5,mode='kamadakawai',
main='kamadakawai')library(network)
library(intergraph)
library(igraph)
library(networkD3)
plot(netsym,vertex.cex=0.5,main="Too small nodes")sidenum <- 3:7
rolecat <- as.factor(get.vertex.attribute(asIgraph(netsym),"role"))
plot(netsym,usearrows=FALSE,vertex.cex=4, main="Different node type",
displaylabels=F,vertex.sides=sidenum[rolecat])n_edge <- network.edgecount(netsym)
linecol_pal <- c("blue","red","green")
edge_cat <- sample(1:3,n_edge,replace=T)
plot(netsym,vertex.cex=1.5,vertex.col="grey25", main="Edge coloring example",
edge.col=linecol_pal[edge_cat],edge.lwd=2)n_edge <- network.edgecount(netsym)
edge_cat <- sample(1:3,n_edge,replace=T)
line_pal <- c(2,3,4)
gplot(netsym,vertex.cex=0.8,gmode="graph", main="Different edge type",
vertex.col="gray50",edge.lwd=1.5,
edge.lty=line_pal[edge_cat])my_pal <- brewer.pal(5,"Dark2")
rolecat <- as.factor(get.vertex.attribute(asIgraph(netsym),"role"))
plot(netsym,
main = "Infractional network",
usearrows=FALSE,
mode="fruchtermanreingold",
vertex.col = my_pal[rolecat],
label=abrevnamelab,
displaylabels=T,
vertex.cex = 1.5)
legend("bottomleft",legend=c("Aducator clienti","Comerciant","Cartita","Contrabandist","Depozitare"),
col=my_pal,pch=19,pt.cex=1.5,bty="n",
title="Criminal Role")#Visnetwork
library(visNetwork)
inetsym_edge <- get.edgelist(inetsym)
inetsym_edge <- data.frame(from = inetsym_edge[,1],
to = inetsym_edge[,2])
inetsym_nodes <- data.frame(id = as.numeric(V(inetsym)))
visNetwork(inetsym_nodes, inetsym_edge, width = "100%")
Descoperirea subgrupurilor dintr-o retea
Retele sociale: tendinta de separare a actantilor in subgrupuri puternic interconectate, ale caror legaturi cu membrii din afara subgrupului sunt in numar relativ mic
Subgrupuri: submultimile de noduri ce contin cat mai multe conexiuni directe
Componenta ce ofera informatii despre puterea unei legaturi (spre exemplu, intr-o retea de socializare - frecventa de interactiune: zilnic, saptamanal, lunar, ocazional) poate fi integrata in determinarea subgrupurilor
Clique: Sunt considerate subgrupuri doar submultimile care au conexiune intre oricare doua noduri (extragerea de subgrafuri complete)
Observatii:
putem sa consideram o dimensiune minima pentru un subgrup, de regula 3 noduri;
“clique maximal” (care nu mai poate fi extins prin adaugarea altor noduri).
# Determine the maximum size of a clique
clique.number(inetsym)
# Determine all the cliques
cliques(inetsym, min=3)
# Determine the maximal cliques
maximal.cliques(inetsym, min=3)
# Determine the cliques with maximum size
largest.cliques(inetsym)
Constrangerea de a avea legaturi intre oricare doua noduri dintr-un subgrup este adesea prea puternica. Astfel, a fost introdusa notiunea de “k-Core”: daca extragem un k-Core, toate nodurile trebuie sa aiba gradul cel putin egal cu k.
Modularitate:
Pentru a compara diferite moduri de a imparti o retea in subgrupuri
Statistica ce poate fi calculata efectuand scaderea dintre probabilitatea ca alegand o legatura din retea aceasta sa fie intre noduri din acelasi subgrup si aceeasi probabilitate calculata intr-o retea formata din aceleasi noduri, dar in care muchiile sunt distribuite in mod aleator (numarul de muchii din reteaua initiala este conservat).
Vom incerca sa realizam o impartire in subgrupuri a retelei de traficanti folosind drept criteriu rolul fiecarui actant.
Se poate observa la sfarsit ca acest criteriu nu este cel mai potrivit tinand cont de faptul ca modularitatea obtinuta este negativa.
## Computing the modularity of the clusterization method
V(inetsym)$group <- strtoi(V(inetsym)$role)
modularity(inetsym, V(inetsym)$group)## [1] -0.1508488
## Visualizing the network by emphasizing the role of each node
colors <- brewer.pal(5,"Dark2")
V(inetsym)$color <- colors[strtoi(V(inetsym)$role)]
op <- par(mfrow=c(1,1))
plot(inetsym,vertex.color=V(inetsym)$color,vertex.size=20)“Community Detection”: diversi algoritmi care sa exploreze tiparele legaturilor catre exteriorul unui subgrup.
Vom incerca in continuare sa utilizam diferite modele pentru a genera o retea care sa aiba caracteristici similare cu reteaua initiala.
“Random graph model”: reteaua este generata pornind de la un numar specificat de noduri si adaugand, in mod aleator, un numar specificat de muchii.
## Generate a similar network using Erdos-Renyi method (by specify the number of edges)
no_nodes <- length(V(inetsym))
no_edges <- length(E(inetsym))
er_net1 <- erdos.renyi.game(n=no_nodes,no_edges,type='gnm')
## Generate a similar network using Erdos-Renyi method (by specify the probability of having an edge between two nodes)
edge_prob <- no_edges / ((no_nodes-1)*no_nodes)
er_net2 <- erdos.renyi.game(n=no_nodes, edge_prob,type='gnp')
op <- par(mfrow=c(1,3))
plot(inetsym,vertex.label=NA,vertex.size=10)
plot(er_net1, vertex.label=NA, vertex.size=10)
plot(er_net2, vertex.label=NA, vertex.size=10)
Modelul “Small-World”: porneste de la o configuratie prestabilita a muchiilor retelei, urmata de o modificarea a acestora cu o anumita probabilitate.
## Generating a similar network using Small-World Model
avg_degree <- 2*no_edges/no_nodes
ws_net1 <- watts.strogatz.game(dim=1, size=no_nodes, nei=avg_degree/2, p=.25)
ws_net2 <- watts.strogatz.game(dim=1, size=no_nodes, nei=avg_degree/2, p=.5)
ws_net3 <- watts.strogatz.game(dim=1, size=no_nodes, nei=avg_degree/2, p=.75)
## Visualizing the results
op <- par(mfrow=c(2,2))
plot(inetsym,vertex.label=NA,vertex.size=10)
plot(ws_net1, vertex.label=NA, vertex.size=10)
plot(ws_net2, vertex.label=NA, vertex.size=10)
plot(ws_net3, vertex.label=NA, vertex.size=10)Modelele “random graph” si “small-world” nu folosesc proprietati ale gradelor nodurilor din retea.
Modelul “Scale-Free”: intr-o retea empirica, gradele nodurilor nu au o distributie apropiata de medie, existand multe noduri cu grad mic, dar si noduri cu grad considerabil mai mare decat medie (intr-o retea sociala, aceste noduri ar fi persoanele celebre din diferite domenii: actori, sportivi, politicieni, etc.).
## Generate a similar network using Scale-Free Model
b_net <- barabasi.game(no_nodes, directed=FALSE)
## Visualize the result
op <- par(mfrow=c(1,2))
plot(inetsym,vertex.label=NA, vertex.size=10)
plot(b_net,vertex.label=NA, vertex.size=10)
Statistici comparative ale retelelor generate si rezultatele observate in cadrul retelei empirice:
## Name Size Density Avg_Degree Transitivity Isolates
## 1 Erdos-Renyi 21 0.1714286 1.714286 0.20512821 1
## 2 Small world 21 0.1000000 1.000000 0.08333333 1
## 3 Scale-free model 21 0.0952381 0.952381 0.00000000 0
## 4 Empiric network 21 0.1714286 1.714286 0.25000000 0